68 research outputs found

    Evaluation of IVR data collection UIs for untrained rural users

    Get PDF
    Due to the rapid spread of mobile phones and coverage in the developing world, mobile phones are being increasingly used as a technology platform for developing-world applications including data collection. In order to reach the vast majority of mobile phone users without access to specialized software, applications must make use of interactive voice response (IVR) UIs. However, it is unclear whether rural users in the developing world can use such UIs without prior training or IVR experience; and if so, what UI design choices improve usability for these target populations. This paper presents the results of a real-world deployment of an IVR application for collecting feedback from teachers in rural Uganda. Automated IVR data collection calls were delivered to over 150 teachers over a period of several months. Modifications were made to the IVR interface throughout the study period in response to user interviews and recorded transcripts of survey calls. Significant differences in task success rate were observed for different interface designs (from 0% to over 75% success). Notably, most participants were not able to use a touchtone or touchtone-voice hybrid interface without prior training. A set of design recommendations is proposed based on the performance of several tested interface designs

    Transparent dynamic instrumentation

    Get PDF
    Process virtualization provides a virtual execution environment within which an unmodified application can be monitored and controlled while it executes. The provided layer of control can be used for purposes ranging from sandboxing to compatibility to profiling. The additional operations required for this layer are performed clandestinely alongside regular program execution. Software dynamic instrumentation is one method for implementing process virtualization which dynamically instruments an application such that the application's code and the inserted code are interleaved together. DynamoRIO is a process virtualization system implemented using software code cache techniques that allows users to build customized dynamic instrumentation tools. There are many challenges to building such a runtime system. One major obstacle is transparency. In order to support executing arbitrary applications, DynamoRIO must be fully transparent so that an application cannot distinguish between running inside the virtual environment and native execution. In addition, any desired extra operations for a particular tool must avoid interfering with the behavior of the application. Transparency has historically been provided on an ad-hoc basis, as a reaction to observed problems in target applications. This paper identifies a necessary set of transparency requirements for running mainstream Windows and Linux applications. We discuss possible solutions to each transparency issue, evaluate tradeoffs between different choices, and identify cases where maintaining transparency is not practically solvable. We believe this will provide a guideline for better design and implementation of transparent dynamic instrumentation, as well as other similar process virtualization systems using software code caches

    Increasing and Detecting Memory Address Congruence

    Get PDF
    A static memory reference exhibits a unique property when its dynamic memory addresses are congruent with respect to some non-trivial modulus. Extraction of this congruence information at compile-time enables new classes of program optimization. In this paper, we present methods for forcing congruence among the dynamic addresses of a memory reference. We also introduce a compiler algorithm for detecting this property. Our transformations do not require interprocedural analysis and introduce almost no overhead. As a result, they can be incorporated into real compilation systems. On average, our transformations are able to achieve a five-fold increase in the number of congruent memory operations. We are then able to detect 95% of these references. This success is invaluable in providing performance gains in a variety of areas. When congruence information is incorporated into a vectorizing compiler, we can increase the performance of a G4 AltiVec processor up to a factor of two. Using the same methods, we are able to reduce energy consumption in a data cache by as much as 35%.Singapore-MIT Alliance (SMA

    Bit-Packing Optimization for StreamIt

    Get PDF
    StreamIt is a language specifically designed for modern streaming applications. A certain important class of these applications operates on streams of bits. This paper presents the motivation for a bit-packing optimization to be implemented in the StreamIt compiler for the RAW Architecture. This technique aims to pack bits into integers so that operations can be performed on multiple bits at once thus increasing the performance of these applications considerably. This paper gives some simple example applications to illustrate the various conditions where this technique can be applied and also analyses some of its limitations.Singapore-MIT Alliance (SMA

    An efficient evolutionary algorithm for solving incrementally structured problems

    Get PDF
    Many real world problems have a structure where small problem instances are embedded within large problem instances, or where solution quality for large problem instances is loosely correlated to that of small problem instances. This structure can be exploited because smaller problem instances typically have smaller search spaces and are cheaper to evaluate. We present an evolutionary algorithm, INCREA, which is designed to incrementally solve a large, noisy, computationally expensive problem by deriving its initial population through recursively running itself on problem instances of smaller sizes. The INCREA algorithm also expands and shrinks its population each generation and cuts off work that doesn't appear to promise a fruitful result. For further efficiency, it addresses noisy solution quality efficiently by focusing on resolving it for small, potentially reusable solutions which have a much lower cost of evaluation. We compare INCREA to a general purpose evolutionary algorithm and find that in most cases INCREA arrives at the same solution in significantly less time.United States. Dept. of Energy (award DESC0005288

    Hyperparameter Tuning in Bandit-Based Adaptive Operator Selection

    Get PDF
    EvoApplications 2012: EvoCOMNET, EvoCOMPLEX, EvoFIN, EvoGAMES, EvoHOT, EvoIASP, EvoNUM, EvoPAR, EvoRISK, EvoSTIM, and EvoSTOC, Málaga, Spain, April 11-13, 2012, ProceedingsWe are using bandit-based adaptive operator selection while autotuning parallel computer programs. The autotuning, which uses evolutionary algorithm-based stochastic sampling, takes place over an extended duration and occurs in situ as programs execute. The environment or context during tuning is either largely static in one scenario or dynamic in another. We rely upon adaptive operator selection to dynamically generate worthy test configurations of the program. In this paper, we study how the choice of hyperparameters, which control the trade-off between exploration and exploitation, affects the effectiveness of adaptive operator selection which in turn affects the performance of the autotuner. We show that while the optimal assignment of hyperparameters varies greatly between different benchmarks, there exists a single assignment, for a context, of hyperparameters that performs well regardless of the program being tuned

    Dynamic expressivity with static optimization for streaming languages

    Get PDF
    Developers increasingly use streaming languages to write applications that process large volumes of data with high throughput. Unfortunately, when picking which streaming language to use, they face a difficult choice. On the one hand, dynamically scheduled languages allow developers to write a wider range of applications, but cannot take advantage of many crucial optimizations. On the other hand, statically scheduled languages are extremely performant, but have difficulty expressing many important streaming applications. This paper presents the design of a hybrid scheduler for stream processing languages. The compiler partitions the streaming application into coarse-grained subgraphs separated by dynamic rate boundaries. It then applies static optimizations to those subgraphs. We have implemented this scheduler as an extension to the StreamIt compiler. To evaluate its performance, we compare it to three scheduling techniques used by dynamic systems (OS thread, demand, and no-op) on a combination of micro-benchmarks and real-world inspired synthetic benchmarks. Our scheduler not only allows the previously static version of StreamIt to run dynamic rate applications, but it outperforms the three dynamic alternatives. This demonstrates that our scheduler strikes the right balance between expressivity and performance for stream processing languages.National Science Foundation (U.S.) (CCF-1162444

    StreamIt: A Language and Compiler for Communication-Exposed Architectures

    Get PDF
    With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necesary to develop a common machine language to allow programmers to express an algorithm in a way that can be efficiently mapped across these architectures. We propose a new common machine language for grid-based software-exposed architectures: StreamIt. StreamIt is a high-level programming language with explicit support for streaming computation. Unlike sequential programs with obscured dependence information and complex communication patterns, a stream program is naturally written as a set of concurrent filters with regular steady-state communication. The language imposes a hierarchical structure on the stream graph that enables novel representations and optimizations within the StreamIt compiler. We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Though StreamIt exposes the parallelism and communication patterns of stream programs, analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.Singapore-MIT Alliance (SMA

    How to Do a Million Watchpoints: Efficient Debugging Using Dynamic Instrumentation

    Get PDF
    Application debugging is a tedious but inevitable chore in any software development project. An effective debugger can make programmers more productive by allowing them to pause execution and inspect the state of the process, or monitor writes to memory to detect data corruption. The latter is a notoriously difficult category of bugs to diagnose and repair especially in pointer-heavy applications. The debugging challenges will increase with the arrival of multicore processors which require explicit parallelization of the user code to get any performance gains. Parallelization in turn can lead to more data debugging issues such as the detection of data races between threads. This paper leverages the increasing efficiency of runtime binary interpreters to provide a new concept of Efficient Debugging using Dynamic Instrumentation, or EDDI. The paper demonstrates for the first time the feasibility of using dynamic instrumentation on demand to accelerate software debuggers, especially when the available hardware support is lacking or inadequate. As an example, EDDI can simultaneously monitor millions of memory locations, without crippling the host processing platform. It does this in software and hence provides a portable debugging environment. It is also well suited for interactive debugging because of the low associated overheads. EDDI provides a scalable and extensible debugging framework that can substantially increase the feature set of standard off the shelf debuggers.Singapore-MIT Alliance (SMA

    Autotuning multigrid with PetaBricks

    Get PDF
    Algorithmic choice is essential in any problem domain to realizing optimal computational performance. Multigrid is a prime example: not only is it possible to make choices at the highest grid resolution, but a program can switch techniques as the problem is recursively attacked on coarser grid levels to take advantage of algorithms with different scaling behaviors. Additionally, users with different convergence criteria must experiment with parameters to yield a tuned algorithm that meets their accuracy requirements. Even after a tuned algorithm has been found, users often have to start all over when migrating from one machine to another. We present an algorithm and autotuning methodology that address these issues in a near-optimal and efficient manner. The freedom of independently tuning both the algorithm and the number of iterations at each recursion level results in an exponential search space of tuned algorithms that have different accuracies and performances. To search this space efficiently, our autotuner utilizes a novel dynamic programming method to build efficient tuned algorithms from the bottom up. The results are customized multigrid algorithms that invest targeted computational power to yield the accuracy required by the user. The techniques we describe allow the user to automatically generate tuned multigrid cycles of different shapes targeted to the user's specific combination of problem, hardware, and accuracy requirements. These cycle shapes dictate the order in which grid coarsening and grid refinement are interleaved with both iterative methods, such as Jacobi or Successive Over-Relaxation, as well as direct methods, which tend to have superior performance for small problem sizes. The need to make choices between all of these methods brings the issue of variable accuracy to the forefront. Not only must the autotuning framework compare different possible multigrid cycle shapes against each other, but it also needs the ability to compare tuned cycles against both direct and (non-multigrid) iterative methods. We address this problem by using an accuracy metric for measuring the effectiveness of tuned cycle shapes and making comparisons over all algorithmic types based on this common yardstick. In our results, we find that the flexibility to trade performance versus accuracy at all levels of recursive computation enables us to achieve excellent performance on a variety of platforms compared to algorithmically static implementations of multigrid. Our implementation uses PetaBricks, an implicitly parallel programming language where algorithmic choices are exposed in the language. The PetaBricks compiler uses these choices to analyze, autotune, and verify the PetaBricks program. These language features, most notably the autotuner, were key in enabling our implementation to be clear, correct, and fast.National Science Foundation (U.S.) (Award CCF-0832997)GigaScale Systems Research Cente
    corecore